Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana

Authors

  • Michael Seifert
  • André Gohr
  • Marc Strickert
  • Ivo Grosse
Abstract

Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
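
The contrast the abstract draws between a mixture model (no spatial dependencies) and a first-order HMM can be illustrated with a minimal sketch. This is not the authors' method and not the Jstacs API: the three states, the Gaussian emission means, the standard deviation, and the self-transition probability below are all hypothetical values chosen for illustration, and the parsimonious higher-order extension itself is not implemented here.

```python
import math

# Hypothetical parameters (assumptions, not taken from the paper):
# state 0 = deletion, 1 = unchanged, 2 = amplification.
MEANS = [-1.0, 0.0, 1.0]   # assumed mean log-ratio per state
SD = 0.4                   # assumed shared standard deviation
STAY = 0.9                 # assumed probability of staying in the same state


def log_emission(x, state):
    """Log density of a Gaussian emission for log-ratio x in the given state."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log transition probability of a first-order chain with sticky states."""
    n = len(MEANS)
    return math.log(STAY if prev == cur else (1.0 - STAY) / (n - 1))


def mixture_decode(xs):
    """Classify each measurement independently (uniform state prior)."""
    return [max(range(len(MEANS)), key=lambda s: log_emission(x, s)) for x in xs]


def viterbi_decode(xs):
    """Most likely state path under the first-order HMM (uniform start)."""
    if not xs:
        return []
    n = len(MEANS)
    score = [math.log(1.0 / n) + log_emission(xs[0], s) for s in range(n)]
    back = []  # back[t][cur] = best predecessor of state cur at step t + 1
    for x in xs[1:]:
        pointers, new_score = [], []
        for cur in range(n):
            best = max(range(n), key=lambda p: score[p] + log_transition(p, cur))
            pointers.append(best)
            new_score.append(
                score[best] + log_transition(best, cur) + log_emission(x, cur)
            )
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    noisy = [0.0, 0.1, -0.6, 0.05, 0.0]
    print(mixture_decode(noisy))   # per-probe calls, no spatial context
    print(viterbi_decode(noisy))   # calls smoothed by neighbouring probes
```

Under these assumed parameters, the mixture model calls an isolated outlier a deletion, while the HMM needs several adjacent low measurements before it switches state. A higher-order HMM conditions each transition on more than one preceding state, which is the dependency range the parsimonious models in the abstract are designed to control.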

Similar resources

Array-based Genome Comparison of Arabidopsis Ecotypes using Hidden Markov Models

Abstract: Arabidopsis thaliana is an important model organism in plant biology with a broad geographic distribution including ecotypes from Africa, America, Asia, and Europe. The natural variation of different ecotypes is expected to be reflected to a substantial degree in their genome sequences. Array comparative genomic hybridization (Array-CGH) can be used to quantify the natural variation o...

A Microarray Based Genomic Hybridization Method for Identification of New Genes in Plants: Case Analyses of Arabidopsis and Oryza

To systematically estimate the gene duplication events in closely related species, we have to use comparative genomic approaches, either through genomic sequence comparison or comparative genomic hybridization (CGH). Given the scarcity of complete genomic sequences of plant species, in the present study we adopted an array based CGH to investigate gene duplications in the genus Arabidopsis. Fra...

Hidden Markov models approach to the analysis of array CGH data

The development of solid tumors is associated with acquisition of complex genetic alterations, indicating that failures in the mechanisms that maintain the integrity of the genome contribute to tumor evolution. Thus, one expects that the particular types of genomic alterations seen in tumors reflect underlying failures in maintenance of genetic stability, as well as selection for changes that p...

Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data

MOTIVATION Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in the genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. While a large number of approaches have been proposed for analyzing the large array CGH datasets, the relative merits of these methods in practice are not clear. RESUL...

P-243: Prenatal Diagnosis Using Array CGH: Case Presentation

Background: Karyotype analysis has been the standard and reliable procedure for prenatal cytogenetic diagnosis since the 1970s. However, the major limitation remains the requirement for cell culture, resulting in a delay of as much as 14 days to get the test results. CGH array technology has proven to be useful in detecting causative genomic imbalances or genetic mutations in as many as 15% of child...

Volume 8

Publication date: 2012